The SIMT (Single-Instruction, Multiple-Thread) model is the heartbeat of GPU architecture. You write code for individual threads, but you launch them in a two-level hierarchy: a grid of thread blocks. To execute a block efficiently, the hardware further partitions it into 32-thread units called warps.
1. SIMT vs. SIMD
Unlike CPU SIMD (e.g., SSE/AVX), where you manually pack data into vector registers, SIMT lets each thread appear to be an independent scalar program. The hardware groups threads into warps and fetches one instruction that all 32 threads execute in lockstep.
2. Linearization Rule
Programmers use threadIdx.x, .y, and .z for indexing logic, but the hardware flattens them into a 1D sequence for scheduling: linearId = threadIdx.x + threadIdx.y * blockDim.x + threadIdx.z * blockDim.x * blockDim.y.
Because the x-dimension is the fastest-varying index, threads with consecutive threadIdx.x values usually land in the same warp, which is critical for memory coalescing.